(54) An audio processing system



(57) An audio processing system (1) has real time recording and
logging computers (10, 15) which record sound and annotation text in
real time. A sound file (SF) and a log file (LF) are associated with
each take. The logging computer (15) uses automatic entries to a tag
file on a server (12) to maintain synchronism between the log and
sound files. Timeline references are embedded as text within the log
files. A transcription workstation (16) retrieves the corresponding
sound and log files, and the log file may be exported to a word
processing application with the embedded timeline references to
provide a template. The transcription workstation (16) correlates time
with digital data strings to allow simple playback using foot pedals
in a conventional manner. It also allows selection of text using a
graphical tool with automatic searching to the associated sound file
segment.
Description

[0001] The invention relates to an audio processing system for
processing of audio proceedings of a setting such as a parliament or a
Court chamber.
[0002] United States Patent Specification No. 5878186 describes such a
system. A computer aided transcription (CAT) system operates in
"virtual realtime" to generate a textual record of the proceedings. A
court reporter produces "transition markers" at changes of speakers.
Because of pauses and court reporter time lags for transition markers,
the text is often out of synchronisation with the audio signals.
Further analysis is performed to achieve synchronisation by
identifying a different time period.

[0003] United States Patent Specification No. 4924387 (Jeppesen) also
describes use of a stenographic device and a controller which
separates court reporter keystroke combinations into phonetic and
control keystrokes. United States Patent Specification No. 5272571
(L. R. Linn and Associates) also describes simultaneous recording of
keystrokes and audio data. It also describes later transcription.
During processing, a table is generated which links stenotype
keystrokes with fields, audio file pointers, and corresponding text
strings.

[0004] Such systems provide a good deal of textual information in near
real time. However, where comprehensive editing is required the
systems are quite inflexible. Also, it appears that considerable
manpower is required during the audio proceedings.
[0005] The invention is therefore directed towards providing a system
which both captures audio proceedings in real time in a comprehensive
manner, and which allows comprehensive and flexible editing
facilities.
[0006] According to the invention, there is provided an audio
processing system for generation of transcripts from audio
proceedings, the system comprising capture means for recording audio
and corresponding textual data and transcription means for generating
transcripts, characterised in that,

the capture means comprises a logging means comprising means for
generating a textual log file for audio proceedings and for embedding
timeline references in the log file; and

the transcription means comprises means for reading the embedded
timeline references, for correlating the timeline references to
position data in an audio sound file, and for automatically locating
and playing back sound from the sound file in response to selection of
indicia indicating timeline references in the log file.

[0007] In one embodiment, the capture means com- 
prises a recording means comprising means for record- 
ing the audio proceedings in the sound file. 



[0008] Preferably, the recording means comprises 
means for simultaneously generating timeline data and 
making said data available to the logging means. 
[0009] In one embodiment, the recording means comprises means for
automatically generating the sound file on a server and for updating
the sound file during the audio proceedings, and for writing said
timeline data to a tag file on the server, and the logging means
comprises means for performing reads from the tag file on the server
during generation of the log file.

[0010] Preferably, the recording means comprises 
means for automatically writing a current sound file 
name and an activity flag indicating if audio proceedings 
are about to begin, have begun, or have stopped, and 
the logging means comprises means for reading said 
flag and operating accordingly.
[0011] In another embodiment, the recording means comprises means for
writing an overlap flag to the tag file to indicate if the current
sound is in an overlap period between sound takes.

[0012] In a further embodiment, the logging means comprises means for
automatically embedding timeline references upon input of a new
speaker identity in the log file annotation inputs.

[0013] Preferably, the logging means comprises means for recording
initial spoken words after identification data for a new speaker.
[0014] In a further embodiment, the timeline references are
represented as a symbol at the start of a display text line, the
timeline reference being expanded upon selection of the symbol.

[0015] In another embodiment, the transcription means comprises means
for exporting the log file to a word processor application and for
playing back the sound file at a position selected by a user using the
word processor application for transcript editing.
[0016] Preferably, the transcription means comprises means for
automatically displaying a time counter for the current sound file
position to allow user input of a timeline reference in a log file or
in a transcript file.

[0017] The invention will be more clearly understood from the
following description of some embodiments thereof, given by way of
example only with reference to the accompanying drawings in which:-

Fig. 1 is a schematic representation of an audio processing system of
the invention;

Fig. 2 is a representation of a tag file generated within the system;

Fig. 3 is a representation of part of a log file generated within the
system; and

Fig. 4 is a representation of a part of a transcript generated by the
system.

[0018] Referring to the drawings, there is shown an audio processing
system 1 of the invention. The system 1 comprises a real time section
2 and an off-line transcription section 3.

[0019] A local area network 4 interconnects the various processing
devices. These include a recording computer 10 which receives audio
inputs from microphones 11. The recording computer 10 accesses a
server 12 via the network 4. The server 12 has an audio storage area
13 and a transcript storage area 14.
[0020] A logging computer 15 receives manual text annotation inputs
and it also communicates with the server 12. The recording computer 10
and the logging computer 15 are the primary real time devices within
the system and both access the server 12 and indeed access the same
folders or directories within the server 12 storage structures.

[0021] The off-line section 3 comprises a transcription workstation
16, an editing workstation 17, a transcript printer 18, and a modem
19. The printer 18 is for printing of prepared transcripts of audio
proceedings, and the modem 19 is for remote communication of
transcripts. The network is in this embodiment a local area network;
however, the various devices may be distributed more widely using
wide area network technology. Thus, the basic design of the system 1
is very flexible.
[0022] The system 1 may be used for generation of transcripts of audio
proceedings in a parliament chamber or in a court, for example. For
illustrative purposes, it is presumed that the audio proceedings are
in a parliament chamber. These proceedings are broken into "takes",
each in this embodiment 10 minutes long. The recording computer 10 and
the logging computer 15 operate in real time to capture data and log
it into the server 12.

[0023] The recording computer 10 sets up a sound file (SF) on the
server 12 in a particular folder within the audio storage area 13, the
folder being associated with a particular session of audio proceedings
such as one parliamentary day. A sound file is configured for storing
digital audio data in a conventional format such as a "wav" format.
The recording computer 10 also has a clock which is reset at the start
of the take and this is used to generate entries for a tag file which
is also set up on the server 12. The tag file is continuously updated
and is not associated with any particular take, but is associated with
an audio proceedings session.
[0024] In Fig. 2, a tag file is indicated by the numeral 30 and in
this example it comprises the following data fields:-

A: The take name. These run in alphabetical sequence.

137: This is a timeline reference, namely a time stamp of 137 seconds
into the take.

Flags: A first flag indicates with a "0" that there is currently no
overlap, and with a "1" that there is currently overlap between takes.
A second flag indicates with the word "set" that a take is about to
begin, with the word "run" that a take is currently active, and with
the word "stop" that a take has ended.

The start time of the take, in this case 15:17:00.
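
The record of Fig. 2 ("A 137 0 run 15:17:00") can be read as five whitespace-separated values. The following Python sketch shows one way such a record might be parsed. The `TagRecord` name, its field names, and the assumption that the values are separated by whitespace are illustrative only and are not taken from the specification.

```python
from dataclasses import dataclass

ACTIVITY_STATES = ("set", "run", "stop")  # about to begin / active / ended


@dataclass(frozen=True)
class TagRecord:
    take_name: str    # e.g. "A"; takes run in alphabetical sequence
    timeline_s: int   # timeline reference, in seconds into the take
    overlap: bool     # True while there is overlap between takes
    activity: str     # "set", "run" or "stop"
    start_time: str   # take start time, e.g. "15:17:00"


def parse_tag_record(line: str) -> TagRecord:
    """Parse one record laid out as in Fig. 2 (assumed whitespace-separated)."""
    take_name, timeline, overlap, activity, start_time = line.split()
    if overlap not in ("0", "1"):
        raise ValueError(f"unexpected overlap flag: {overlap!r}")
    if activity not in ACTIVITY_STATES:
        raise ValueError(f"unexpected activity flag: {activity!r}")
    return TagRecord(take_name, int(timeline), overlap == "1", activity, start_time)
```

For the record of Fig. 2, `parse_tag_record("A 137 0 run 15:17:00")` gives take "A", 137 seconds into the take, no overlap, and an active take which started at 15:17:00.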

[0025] The recording computer 10 operates a timer which correlates
one-second segments to 8 kB of audio data. The recording computer 10
outputs a tag file update at six-second intervals and so typically
the timeline reference increments by six seconds with every update.
The recording computer 10 also, of course, outputs the sound data
itself, in one-second segments of 8 kB each.
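
The arithmetic behind these figures can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that the first tag file update carries a timeline reference of 0 seconds, which the description does not state.

```python
KB_PER_SECOND = 8           # one second of audio corresponds to 8 kB
TAG_UPDATE_INTERVAL_S = 6   # tag file update interval in seconds


def audio_size_kb(duration_s: int) -> int:
    """Amount of sound data, in kB, written after duration_s seconds."""
    return duration_s * KB_PER_SECOND


def tag_timeline_references(duration_s: int) -> list:
    """Timeline references written to the tag file (assumed to start at 0)."""
    return list(range(0, duration_s + 1, TAG_UPDATE_INTERVAL_S))
```

On these assumptions a ten-minute take amounts to 600 x 8 = 4800 kB of sound data, and the timeline reference in the tag file steps through 0, 6, 12 and so on.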
[0026] At the same time, the logging computer 15 operates in real time
to generate a log file (LF) corresponding to the particular sound file
(SF) in a one-to-one relationship. The log file is stored in the same
folder or directory on the server audio storage area 13. It is also
identified by the same name ("A") as the corresponding sound file. The
logging computer 15 receives text annotation inputted by a reporter. A
sample 35 is illustrated in Fig. 3. The logging computer 15 polls the
tag file every two seconds until it detects a "run" flag, upon which
it automatically sets up a corresponding log file and stores it in the
audio storage area 13. The log file is then updated with a new line
giving a speaker identity and the first few words. In this embodiment,
an update is performed for every "Carriage Return" keystroke.
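
The polling behaviour can be sketched in Python as follows. This is a sketch under stated assumptions rather than the logging computer's actual program: `read_tag_line` is a hypothetical callable which returns the current tag record, the record is assumed to be laid out as in Fig. 2, and the `max_polls` limit is an addition for illustration.

```python
import time
from typing import Callable, Optional

POLL_INTERVAL_S = 2  # the logging computer polls the tag file every two seconds


def wait_for_run(
    read_tag_line: Callable[[], str],
    sleep: Callable[[float], None] = time.sleep,
    max_polls: Optional[int] = None,
) -> str:
    """Poll until the activity flag is "run", then return the take name.

    The returned take name is the name under which the log file would be
    set up, since the log file shares its name with the sound file.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        take_name, _timeline, _overlap, activity, _start = read_tag_line().split()
        if activity == "run":
            return take_name
        polls += 1
        sleep(POLL_INTERVAL_S)  # wait before the next read of the tag file
    raise TimeoutError("no 'run' flag seen in the tag file")
```

Passing the sleep function as a parameter keeps the sketch testable without real delays.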
[0027] The tag file 30 acts as a dynamic link between the recording
and logging computers 10 and 15 and this link is maintained as long as
the audio proceedings continue. Switch over to a new take is flagged
by the overlap flag "0/1" and the overlap period in this embodiment is
15 seconds. Thus, each sound file records 15 seconds of the next take,
which provides sufficient time for roll-over. However, this time is
user-configurable to a desired setting. By simply monitoring the tag
file, the logging computer 15 generates the appropriate logging files
in real time and records the relevant text annotations. Eventually,
the "stop" flag will be written to the tag file, upon which the
logging computer 15 stops generating new log files.

[0028] At the end of the session, there is a set of log and sound
files, there being a single log file and a single sound file
associated with each individual take. An important aspect of
generation of the log file is that the logging computer 15 embeds a
timeline reference in the log file. Examples 36 are shown in Fig. 3.
These three timeline references are <0>, <16>, and <22>. This
information is available to the logging computer from the tag file and
it is automatically embedded as text together with the text
annotations which are inputted.
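
The embedding of a timeline reference as plain text, in the form shown in Fig. 3, might be sketched in Python as follows. The function names are hypothetical and the regular expression simply assumes the "<seconds>" form at the start of a line.

```python
import re
from typing import Tuple

_TIMELINE_RE = re.compile(r"^<(\d+)>\s*(.*)$")


def format_log_line(timeline_s: int, speaker: str, first_words: str) -> str:
    """Build a log line with the timeline reference embedded as text."""
    return f"<{timeline_s}> {speaker}: {first_words}"


def parse_log_line(line: str) -> Tuple[int, str]:
    """Split a log line into its timeline reference (seconds) and its text."""
    match = _TIMELINE_RE.match(line)
    if match is None:
        raise ValueError(f"no timeline reference in: {line!r}")
    return int(match.group(1)), match.group(2)
```

Because the reference is ordinary text, it survives any later copying of the log file into another application.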
[0029] When it is desired to generate a transcript for the audio
proceedings session, the transcription workstation 16 and the editing
workstation 17 are operated. The transcription workstation 16
retrieves the sound and log files for the session and opens
corresponding sound and log files simultaneously. The transcription
workstation 16 has foot pedals which are equivalent to those of
conventional audio tape dictating playback machines, namely Rewind,
Play, and Forward. An operator operates these pedals in a manner
equivalent to using a dictation and playback machine. Again, the
transcription workstation 16 associates a one second time period with
8 kB of sound data and it "rewinds" or "fast forwards" through the
relevant sound file by 8 kB for each second of depression of the
Rewind or Fast-Forward pedals. Also, the transcription workstation 16
recognises keyboard or mouse selections of lines of text such as one
of the lines illustrated in Fig. 3. The embedded timeline reference
allows it to locate the relevant position in the corresponding sound
file. For example, the third line illustrated in Fig. 3 is 22 seconds
into the take and therefore the workstation starts playback at the
176 kB position in the sound file (22 seconds at 8 kB per second). In
this way, the operator can type in text as he or she listens to
playback of the sound file. The location within the log file for
typing the text is indicated by the name of the speaker followed by
the first few words which are spoken.
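
The correlation between time and position in the sound file can be sketched in Python as follows. The function names and the pedal labels are hypothetical, and clamping the position to the bounds of the file is an assumption which the description does not address.

```python
KB_PER_SECOND = 8  # one second of sound corresponds to 8 kB


def playback_offset_kb(timeline_s: int) -> int:
    """Position in the sound file, in kB, for a given timeline reference."""
    return timeline_s * KB_PER_SECOND


def pedal_seek_kb(position_kb: int, pedal: str, held_s: int, file_size_kb: int) -> int:
    """Move 8 kB per second for which a pedal is held down."""
    step = held_s * KB_PER_SECOND
    if pedal == "rewind":
        position_kb -= step
    elif pedal == "forward":
        position_kb += step
    else:
        raise ValueError(f"unknown pedal: {pedal!r}")
    # Assumed behaviour: never seek before the start or past the end.
    return max(0, min(position_kb, file_size_kb))
```

For the third line of Fig. 3, `playback_offset_kb(22)` gives 176, matching the 176 kB position given above.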

[0030] This workstation also operates to cut and paste the log file
into a third-party word processing application. Indeed, cutting and
pasting of log files into a word processor application allows creation
of an initial template for generation of a transcript. An important
aspect of the embedded tags within the log file is that they are in a
text format which is ported with the other text into the word
processor template. A sample template 40 is shown in Fig. 4. As the
workstation 16 operates, it not only displays the text, but also
displays either a symbol indicating the timeline reference or the
reference itself. The symbol may be a dot such as a dot 41 at the
start of the line to indicate that expansion of this dot displays the
timeline reference. During execution of the transcription programs,
the workstation 16 automatically displays the time counter for the
current text. This is indicated as 00.16 in the example of Fig. 3.
Thus, an operator who has both the transcription word processor file
and the log file open can manually input a timeline reference in order
to make these references more comprehensive in the transcript. These
references are particularly important also for the editing workstation
17 as additional operators may provide an input into a particular
transcript, depending on their particular transcript skills. It must
be borne in mind that parliamentary transcription is a highly skilled
task and the system 1 allows input from a number of people in a simple
and highly controlled and structured manner.
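
The display of a symbol in place of a timeline reference, and of a time counter, might be sketched in Python as follows. The sketch assumes that the counter "00.16" reads as minutes and seconds, and it uses "*" as the symbol, as in the template of Fig. 4. The function names are hypothetical.

```python
import re

_TIMELINE_RE = re.compile(r"^<(\d+)>\s*")


def time_counter(timeline_s: int) -> str:
    """Format seconds into the take as "MM.SS" (assumed counter format)."""
    minutes, seconds = divmod(timeline_s, 60)
    return f"{minutes:02d}.{seconds:02d}"


def collapse_reference(line: str, symbol: str = "*") -> str:
    """Replace a leading timeline reference with a display symbol."""
    return _TIMELINE_RE.sub(lambda _match: symbol + " ", line, count=1)
```

A line without a leading reference is returned unchanged, so the function can be applied to every line of a template.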

[0031] It will be appreciated that the invention provides for
generation of a transcript with very accurate real time recording of
sound and annotation data and interlinking of sound, logging, and
transcript files in a manner whereby comprehensive and accurate
transcripts may be generated with any required extent of editing and
input from different people. At the same time, the system also allows
re-checking back to the sound files in a highly organised and
efficient manner. Even the final transcript product is correlated back
to the audio recording which was captured live.
[0032] The invention is not limited to the embodiments described but
may be varied in construction and detail within the scope of the
claims.



Claims



1. An audio processing system (1) for generation of transcripts from
audio proceedings, the system (1) comprising capture means for
recording audio and corresponding textual data and transcription means
for generating transcripts, characterised in that,

the capture means comprises a logging means (15) comprising means for
generating a textual log file (LF) for audio proceedings and for
embedding timeline references in the log file; and

the transcription means (16) comprises means for reading the embedded
timeline references, for correlating the timeline references to
position data in an audio sound file (SF), and for automatically
locating and playing back sound from the sound file in response to
selection of indicia indicating timeline references in the log file.

2. A system as claimed in claim 1, wherein the capture means comprises
a recording means (10) comprising means for recording the audio
proceedings in the sound file (SF).

3. A system as claimed in claim 2, wherein the recording means (10)
comprises means for simultaneously generating timeline data and making
said data available to the logging means.

4. A system as claimed in claim 3, wherein the recording means (10)
comprises means for automatically generating the sound file on a
server and for updating the sound file during the audio proceedings,
and for writing said timeline data to a tag file on the server, and
the logging means comprises means for performing reads from the tag
file on the server during generation of the log file.

5. A system as claimed in claim 4, wherein the recording means
comprises means for automatically writing a current sound file name
and an activity flag indicating if audio proceedings are about to
begin, have begun, or have stopped, and the logging means comprises
means for reading said flag and operating accordingly.

6. A system as claimed in claims 4 or 5, wherein the recording means
comprises means for writing an overlap flag to the tag file to
indicate if the current sound is in an overlap period between sound
takes.


7. A system as claimed in any preceding claim, wherein the logging
means comprises means for automatically embedding timeline references
upon input of a new speaker identity in the log file annotation
inputs.

8. A system as claimed in any preceding claim, wherein the logging
means comprises means for recording initial spoken words after
identification data for a new speaker.



9. A system as claimed in any preceding claim, wherein the timeline
references are represented as a symbol at the start of a display text
line, the timeline reference being expanded upon selection of the
symbol.



10. A system as claimed in any preceding claim, wherein the
transcription means (16) comprises means for exporting the log file to
a word processor application and for playing back the sound file at a
position selected by a user using the word processor application for
transcript editing.



11. A system as claimed in any preceding claim, wherein the
transcription means comprises means for automatically displaying a
time counter for the current sound file position to allow user input
of a timeline reference in a log file or in a transcript file.

12. A system substantially as described with reference 
to the accompanying drawings. 



13. A computer program product directly loadable into the internal
memory of a digital computer, and comprising software code for
implementing the capture means and the transcription means when said
product is run on a digital computer.

A 137 0 run 15:17:00

[take name] [timeline reference] [flags] [take start time]

Fig. 2

<0> Mr. A: The Minister will be aware
<16> Minister: The legislation Report Stage
<22> Chairman: The matter will

Fig. 3

* Mr. A: The minister will be aware that we have been making
representations in relation to this matter for some time.

* Minister: The legislation Report Stage is due next week. It deals with
all of the matters under review.

* Chairman: The matter will be discussed in the Chamber in full detail
when the Report Stage is reached.



Fig. 4 



